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ABSTRACT 


Least-squares  estimates  of  regression  coefficients 
are  extremely  sensitive  to  large  errors  in  even  a 
single  data  point.  Frequently,  an  ad-hoc  procedure  is 
used  to  weight  the  data  in  a  manner  to  alleviate  the 
effects  of  extreme  observations. 

This  thesis  is  a  study  of  the  effectiveness  of  an 
iterative  regression  method  using  weights  derived 
through  maximum-likelihood  arguments.  Actual  weights 
are  calculated  on  the  assumption  of  Cauchy-distributed 
error  as  a  worst-case  situation  in  which  the  errors 
have  long,  fat  tails  and  no  finite  moments. 
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I.   INTRODUCTION 


A.   LEAST-SQUARES  LINEAR  REGRESSION 


It  is  often  desirable  to  model  the  behavior  of  a 
response  variable  as  a.  function  of  another  variable, 
sometimes  referred  to  as  a  "carrier",  since  it  carries 
information  about  the  dependent  variable.  In  the  simplest 
case,  the  equation 

y   =  b   +  b  x 
i    o     i  i 

is   fitted   to    a  set    of    data    points    (x    ,  y   )  .       Usually    this      is 

i   i 

done   using   the  "least-squares"  procedure  which  selects  the 

coefficients  b   and  b   that   minimize   the   sum   of   squared 
01 

residuals,  r  ,   defined  as 
i 

A  A 

r   =  y   -b   -bx 
i    i     o     i  i 


The  procedure  is  based  on  the  linear  model 

y  =b   +bx   +6 
i     o     i  i     i 

where      the      6        are      independent   and   identically   distributed 
i 

random    variables    with    mean   zero  and  constant      (but      unknown) 
variance.         Then,    by    the   Gauss-Markov    Theorem,    the    estimates 


b   and  b   found  by  solving  the  "normal  equations" 
o       1 


IE  y   =nb    +   b  I  x 
i      o       1    i 


>  x  y   =  b  >  x   +  b  >  x2 

i  i     o    i     i    i 

are   the    best     (minimum    variance)     linear    unbiased    estimates   of 

b      and   b    . 
o  1 


B.       DEFICI2NCIES    OF    LEAST-SQUARES 


the      least-squares   procedure    works    very    well    when    the    € 

i 

are  short-tailed  and  the  other  assumptions  about   the   error 

distribution   hold.   If,  however,  the  error  distribution  has 

very  long  tails,  implying  that  extreme  observations  may  well 

occur,  least-squares  quickly  demonstrates  its  sensitivity  to 

large  random  error.   In  real  data,  the  analyst   very   rarely 

has   a  hint  as  to  the  nature  of  the  true  distribution  of  the 

6  .   Heuristic  arguments   appealing   to   the   central   limit 
i 

theorem   are   frequently   made  alonq  the  line  that  there  are 

several  sources  of  variability  in  the  data,  which,  in  the 
aggregate,  will  be  "normally"  distributed  and  thus  suitable 
for  least-squares.  Unfortunately,  if  any  of  the  errors  are 
long-tailed  (such  as  may  be  described  by  the  Cauchy 
distribution),  then  their  aggregate  effect  will  not  be 
normal. 


Figure  10  and  Figure  11  in  the  Appendix   are   histograms 

of  least-squares  estimates  for  b   and  b   in  the  linear  model 

o       1 


y   =  22  +  2  (x  -x)  . 

i  i 

The  e     for  these  estimates  came  from  a  Normal,  or   Gaussian, 

i 

distribution,   for   which   the   least-squares   procedure   is 

optimal.  Figure  12  and  figure  13  show  the  effects  of 
Cauchy-distributed  error  on  the  estimates  of  these 
coefficients.  The  end  cells  contain  points  which  would 
otherwise  be  off  the  scale  of  the  histogram,  and  emphasize 
that  large  errors  in  estimating  the  coefficients  are  quite 
possible  when  using  least-squares.  A  uniform  distribution's 
adverse  effect  upon  the  coefficient  estimates  is  shown  in 
Figure  14  and  Figure  15. 

A  function  of  Cauchy  variates, 

C/100 

1    =  Ce 

is  the  error  density  associated  with  the  widely- varying 
coefficient  estimates  histogrammed  in  Figure  16  and  figure 
17.  This  distribution  is  virtually  symmetric  between  ±12, 
but  has  a  long  tail  extending  toward  +oo  .  Another 
distribution  of  error,  a  function  of  normal  variates, 

N+0.01N2 
Z  =  Ne 


has  high  positive  skewness,  a  little  bias,   and   an   adverse 
effect   upon   the   least-squares 

shown  in  Figure  18  and  Figure  19. 


effect   upon   the   least-squares   estimates  for  b   and  b   as 

o  1 


All    of    these    cases   demonstrate    that   the    variances    of    the 


10 


coefficient  estimates  may  be  drastically  increased  by  the 
presence  of  non-gaussian ,  and  especially  long-tailed, 
distributions  of  error.  While  the  bulk  of  the  estimates  do 
indeed  fall  near  the  actual  values,  there  is  clearly  an 
unacceptable  probability  of  obtaining  an  extreme  estimate 
when  using  the  least-squares  procedure. 


C.   USE  OF  THE  CAUCHY  DISTRIBUTION 


Data  disturbed  by  Cauchy-distributed  error,  with  long, 
thick  tails  and  lack  of  finite  moments,  may  be  considered  an 
extremely  difficult  case  for  regression  techniques  to  treat 
reliably.  A  procedure  that  works  well  for  data  subjected  to 
such  extremely  straggly-tailed  errors  can  reasonably  be 
expected  to  work  well,  though  not  necessarily  optimally,  in 
many  curve-fiting  situations.  This  thesis  uses 
maximum-liklihood  estimates  for  regression  coefficients  to 
develop  a  robust  regression  procedure,  then  further  assumes 
a  Cauchy-distributed  error  to  apply  a  specific  technique  to 
a  series  of  controlled  regression  problems. 
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II.        SINGLE-CARRIER    ROBUST    REGRESSION 


A.        MAXIMUM-IIKELIHOOD    ESTIMATORS 


The      procedure      to   be   presented   is    based    upon    the    linear 
model 

y      =   b      +   b    (x  -x)    +    e 
i  o  *       i  i 

The        are   assumed    to   be   independent,   identically 
i 

distributed  random  variables  centered  at  zero  with  spread 
(a  scale  parameter)  and  having  a  density  of  the  form 

Hr) 

The   probability   for   any   single   observation   y^   may   be 

i 

expressed  as 


fy   -  b  -  b  (x  -x)\  1 
P  (T  =  y  )   =   f[  i    °    *   i     -  dy 
1        l        l  I 
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The  likelihood  function  for  n  observations  is  the  product  of 
n  of  the  above  probilities.  Taking  logarithms,  the 
log-likelihood  function  is  then 


(VV* ]    =   - ln    * 


y  -  b  -  b  (x  -x) 
i    o    1   i 


-   n  ln  { 


Partial    derivatives   are    taken    with   respect   to    b    ,    b      and 

o         1 

F    ,  and  all  set  eaual  to  zero  to  find   the   b    ,b    and    I 

s  0      1 

which  maximize  L.   Using  r   above,  (p  (x)    is  defined  as 

i 


(p(x) 


f  ' 
f 


6f 


In   f  (x)     , 


the      three      equations      obtained      from      setting      the      partial 
derivatives    of   L   to   zero   may    be    written   as 


*</, 


fr    \/y   -    b  -    b    (x    -x! 


=      0 


/r  \   /y   -   b  -    b    (x  -x)\  ^  1  n 

//r'    ill    i 2 1 i (y.~    b   -    b     (x.-x))       -        +      -      =      0 

r  I       l        o         l      l  £2  Z, 


i 
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This   system   of   non-linear   equations   may   be   solved 

iteratively.   Defining  w     as 

i 


where   the   superscript   (j)   refers   to    the   number   of 

th 
iterations,  The  equations  at  the  j    iteration  are 


A  A 


_        (J) 

>  w    (y  -  b  -  b  (x  -x)  )  =  0 
i     i    o    1   i 


(3) 
2   w    (x  -x) (y  -  b  -  b  (x  -x) )  =  0 

i     i     i    o    l      i 


JL  _   (j)r  CM)  12 
F2.      =   /  .    n  ^  w.   [r.     ]z 

I)]       b  3-  1       1     1 


The  first  two  equations  are  simply  weighted  least-squares 
normal  equations  which  may  be  solved  by  standard  iterative 
weighted  least-squares  algorithms  in  which  the  weights  for 
each   subsequent   iteration   are   calculated   from  the  above 

expression  for  w 

i 

Assuming   the   error  to  be  Cauchy-distributed,  the  weighting 

formula  becomes 


(j)       s      /    /  (j-1) 

w      = 
i 


2 


»  1-1  [1  +  /  J^ 

1 1-, 
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B.   INITIAL  ESTIMATES 


It  is  necessary  to  begin  the  iterative  process  .with  an 
initial  estimate,  or  guess,  of  the  values  of  the 
coefficients.  A  robust  estimate  suggested  by  D.  F.  Andrews 
[1]  using  the  median  provides  an  estimate  which  is 
insensitive  to  arbitrarily  large  disturbances  in  up  to  25% 
of  the  data 


The  first  coefficient,  b   (which  corresponds  to  the  mean 

o 

of  the  y   in  least-squares  estimation)  is  estimated   by   the 
i 

median  of  the  y  : 

i 


0      1 


Next,  the  carriers,  (x  -x)  ,  are  ordered  and  then   broken   up 

i 

into   three   groups  of  approximately  equal  size.  Of  interest 


are  the  upper  group  of  carriers,  x  ,  the  lower  carriers,  x  , 

U  L 

and  the  y   corresponding  to  the  (x  -x)  in  each  group  (y   and 
i  i  u 

V- 

The   estimate  for   b   is   a   rough  slope  computed  from  the 
medians  of  the  four  groups: 


t       i 

y   -  y. 

u 
b 


i       i 

x    -   X 


u 
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The   median   of   the   absolute   values  of  the  residuals  from 
these  estimates  is  the  initial  guess  for   £  : 

I      =  median  |y   -  b   -b  (x  -x) | 
jo  i     o    1   1 

o 
(1) 

Weights  w    are  calculated  from  the  residuals  and   £  .   The 
J  ° 

algorithm  then  proceeds  until  the  values  of  the  coefficients 
stabilize. 


C.   SUMMARY  OF  PROCEDURE 


1 •   Overall  Effect 

Figure  1  is  a  typical  scatterplot  of  data  which 
includes  extreme  observations,  or  "outliers",  and  sketches 
of  representative  least-squares  and  robust  fits.  The  effect 
of  the  weighting  procedure  is  to  pull  the  extreme 
observations  in  closer  to  the  bulk  of  the  data,  reducing 
their  tendency  to  distort  the  fit  (note  least-squares  line) . 
It  should  be  noted  that  both  the  response  variable  and  the 
carriers  are  weighted  in  this  technique. 
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Least-squares  fit 


Robust  fit 


Figure  1  -   FITS  WITH  OUTLIERS 
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2  •   Solution  Not  Unique 

There   are   cases   in  which  the  robust  procedure  may 

not   converge   to   a   single   global   solution.    Since  the 

solution   to   the   weighted  normal  equations  is  actually  the 

solution  to   the   three   non-linear   equations   obtained   by 

setting   the   partial  derivatives  of     equal  to  zero,  there 

exists  the  possibility  of  converging  to  a  local  solution  not 

optimizing   b    and   b  .    Figure  2  is  an  example  of  a  local 
01 

solution.   The  scatterplot  represents   data   which   actually 

has  two  seperate  means  (the  data  might  be  drive-in  movie 
attendence,  where  observations  were  made  only  on  Wednesdays 
and  Saturdays) .  A  least-squares  fit  approximately  splits 
the  two  groups  of  points  as  indicated.  A  robust  fit  may 
also  split  the  data,  but  could  converge  to  one  of  the  two 
clusters  if  either  is  sparse  enough  to  cause  the  weighting 
process  to  treat  its  points  as  outliers. 
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Robust   fit 


Least  -  squares  fit 


Figure  2  -   LOCAL  SOLUTION 
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3 .   Algorithm 

The  following  flow  chart  depicts  the  algorithm  for 
the  Cauchy-weighting  regression  method.  The  criteria  for 
convergence  (change  in  both  coefficients  of  less  than  0.01% 
from  one  iteration  to  the  next)  was  somewhat  arbitrary,  but 
was  set  to  meet  practical  expectations  inanalysis  problems 
and  not  consume  excessive  amounts  of  computer  time. 


INITIAL  GUESS 

*(0)        *(0) 
b     and   b 
o  1 

J  =  1 


V 


CALCULATE  r 


(j-1) 


•>  1-1    l 


(J) 


A 


V 


SOLVE  WEIGHTED  NORMAL  EQUATIONS 

b     and   b 
o  1 


YES 


> 


Figure  3  -   ALGORTIHM  FLOWCHART 
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D.   INADEQUACY  OF  R2 


One  of  the  measures  of  adequacy  of  fit  for  least-squares 
regression  is  R2,  the  amount  of  variance  explained  by  the 
regression.   It  is  the  ratio 


R2   = 


£  (y.  -  y)2 

i 

2  (y.  -  y)2 

i 


where 


y  =  b   +  b  (x  -x)  . 
i     o     i   i 


For  least-squares,  R2  is  a  fraction  between  0  and  1,  but  for 
a  robust  procedure,  the  above  ratio  may  exceed  1.  This 
occurs  when  the  robust  fit  is  "farther"  from  the  mean  of  the 
data   than   the   least-squares    fir. 

Consider   the    following    set    of   observations. 

y  3.75       6.00       7.00       8.00       10.25 

x  1.00       2.00       4.00       6.00         7.00 

The    mean   of    the    y      is   7.00    and   a    least-squares  fit      of      the 

i 

model    y  =  b   +  b  x     to    the   data   yields    b   =  3.385, 
0      1  o 

b   =  0.094  and  R2  =  0.919.   A  robust  fit   would   reduce  the 
i 

effects  of  observations  (2.00,6.00)  and  (6.00,8.00)  since 
they  lie  somewhat  off  the  line  through  the  other  three 
points.  A  robust  procedure,  bringing  these  "extreme" 
observations  in  closer  to  the  rest  of  the  data,   might   well 


21 


yield    coefficients    of    b      =    3.000    and    b      =    1.000.       From    these 

01 

coefficients  and  the  data,  R2  =  1.124.  Figure  4  is  a 
scatterplot  of  the  observations  with  drawings  of  the  actual 
least-squares  fit  and  the  postulated  robust  fit.  Note  that 
the   two  fits  are  very  close,  but  more  importantly,  that  the 

robust  fit  is  rotated  so  that   (y  -  y)  2   for  the  robust  fit 

i 

is   everywhere  greater  than  or  equal  to  the  same  measure  for 

the  least-squares  fit. 
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Figure   4  -   SITUATION  IN  WHICH  R2  EXCEEDS  1 
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Mors  generally,  as  in  Figure  1 ,  R2  as  calculated  above 
is  small  due  to  the  large  deviations  from  the  mean  caused  by 
outliers.  When  a  response  variable  has  only  a  single 
carrier,  a  plot  of  the  data  and  the  fitted  line  provide  a 
visual  evaluation  of  the  fit.  In  multivariable  cases,  it  is 
usually  impossible  to  plot  the  data  meaningfully,  and  the 
good  fit  of  the  robust  line  to  the  bulk  of  the  data  could  be 
belied   by  inappropriately  using  R2  as  a  measure  of  the  fit. 
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III.   EXPERIMENTAL  PROCEDURE  FOR  A  SINGLE  CARRIER 


A.   GENERAL  DESCRIPTION 


A  "true"  model 


y   =  22  +  2  (x  -x)  +  e 

i  i        i 

was    established    to   enable   comparisons   of   the   Cauchy 

weighting  technique  and   least-squares.    The   x    were   the 

i 

integers   1   through   20,   and  random  variates  were  selected 

from  one  of  five  controlled  error  distributions   to   produce 

20   observations   of   the  y  .   The  y   were  then  regressed  on 

i        i 

the  (x  -x)  to  obtain  estimates  for  b   and  b   which  could   be 
i  o       i 

compared  to  the  actual  values. 


One    thousand    replications    were    made    for    each 
distribution   and   each  method.   Histograms  were  constructed 

A  A 

for   both   b    and   b    to   reveal    their    distributions. 
o  1 

Preliminary   runs   indicated   that   most   problems  converged 

(both  coefficients  changed  less  than  0.01%  from  one 
iteration  to  the  next)  within  10  iterations.  To  reduce  the 
amount  of  time  to  perform  the  experiment,  the 
Cauchy-weighting  iterations  were  terminated  no  later  than 
the  seventh  iteration.    Values   of   the   coefficients   were 
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recorded  at  the  fourth  iteration  to  see  if  there  were 
significant  changes  between  that  and  the  final  iteration. 
If  the  problem  converged  early,  data  normally  collected  at  a 
later  point  were  assigned  the  stabilized  values. 


B.   ERROR  DISTRIBUTIONS 


Five  controlled  error  distributions  were  used  to  disturb 

the  observations.    The   first,   the   Gaussian  or   "Normal" 

distribution   with   mean   zero   was  matched  to  the  second,  a 

Cauchy  distribution.    This   was   done   by   integrating  the 

th 
Cuachy   density   centered  at  zero  to  find  the  75    guantile, 

giving  1  as  a  measure  of  the   spread   of   the   distribution. 

th 
Since  the  corresponding  Normal  (0,1)  75    guantile  is  0.6745, 

a  Normal  distribution  with  standard  deviation  1.4826  will 
have  the  same  interguartile  range  as  a  Cauchy  distribution 
with  spread  parameter  1.  The  third  source  of  error  was  a 
uniform  distribution  with  mean  zero  and  variance  matched  to 
the  above  Normal,  giving  it  a  range  of  -13.1886  to   13.1886. 

The  "V"  density  is  a  function  of  Cauchy  variates  C: 

C/100 

V  =  Ce 

It  is  positively  skewed,  but  virtually  symmetric  between  -18 
and  +18  with  a  very  pronounced  central  spike.  Figure  5  is  a 
histogram  of  2000  "V"  variates. 
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The      final   test   density    is    a   function    of   Normal    variates 
N    with    mean    zero   and    variance    1. 

N+0.01N2 

Z    =    Ne 

It    is   positively   skewed  and    slightly   biased.      A   histogram   of 
2000    "Z"    variates    is    shown    in   Figure   6. 
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IV.   RESULTS  OF  SINGLE-CARRIER  EXPERIMENT 


LEAST-SQUARES  ADVANTAGES 


Summary   statistics   for   the  distributions  of  b   and  b 

01 

for  the  single-carrier  experiment  are  shown  in  Figures  7  and 

8.  Looking  at  means  and  standard  deviations,  least-squares 
estimates  (maximum-liklihood  estimates  for  normal-error 
data)  are  better  for  normally-distributed  cases  than  the 
Cauchy  method.  Interestingly,  least-squares  is  also 
noticeably  better  when  the  error  comes  from  a  uniform 
distribution.  This  result  could  be  explained  by  the 
relatively   broad  area  in  which  the  data  points  may  fall  for 

the  uniform  error  with  respect  to  the  range   of   the   (x  -x) 

i 

used,  and  the  susceptibility  of  the  Cauchy  method  to 
convergence  to  local  solutions. 
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Fiqure  7  -   SUMMARY  OF  SINGLE-CARRIER  b   DISTRIBUTION 
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Fiqure  8  -   SUMMARY  OF  SINGLE-CARRIER  b   DISTRIBUTION 
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Figure  9  is  a  diagram  of  the  region  in  which  data 
points  may  fall  when  the  error  is  uniform  between 
±13.1886  .  Since  the  observations  may  lie  anywhere  in  the 
region  with  egual  probability,  the  weighting  process  may  not 
be  able  to  clearly  discriminate  which  points  are  outliers. 
Chance  alignments  of  a  series  of  points  could  determine  a 
local  optimum  upon  which  the  Cauchy-likelihood  method  would 
converge.  While  other  distributions  have  longer  tails,  the 
bulk  of  their  variates  fall  within  a  relatively  small 
distance  of  their  center,  better  defining  a  mean  trend  and 
clearly  differentiating  outliers  from  the  rest  of  the 
observations. 
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y  =  22  +  2(x.-x] 


Figure   9  -   EQUALLY-LIKELY  REGION  FOR  UNIFORM  DATA 
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B.   CAUCHY  METHOD  ADVANTAGES 


Comparing  means  and  standard  deviations  for  the  two 
methods,  the  Cauchy- likelihood  procedure  is  clearly  the  more 
reliable  technique  when  the  errors  have  long  tails.  Maximum 
and  minimum  estimates  of  the  coefficients  are  closer  to 
their  true  values  when  the  Cauchy  technique  is  used,  and  it 
never  produces  extreme  estimates.  There  is  little 
difference  in  the  estimates  from  the  fourth  to  the  seventh 
iterations. 


C.   SIMILARITIES 


The   means   and   medians   for   both   methods    are   not 

significantly   different.   The  coefficient  estimates  between 

th       th 
the-  25    and  75    quantiles  are  virtually  the  same  over   all 

distributions,  the  Cauchy-based  method  doing  better  for  the 
long-tailed  distributions  and  the  normal  having  some 
advantage   principally   when   the   error   is   uniform.   Even 

th         th 

between  the  10    and   90     quantiles,   the   two   procedures 

yield  very  similar  results. 
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V.   REGRESSION  ON  TWO  INDEPENDENT  CARRIERS 


A.   MULTIVARIATE  MODEL 


For   two   independent   carriers,   the   linear   model   is 
assumed  to  be  of  the  form 

y  =  b   +  b  (x   -x  )  +  b  (x  -x  )  +  e 

i     o     i v  i1   1      2 li2   2      i 

The   regression   can    be    expressed    in   matrix    terms   as 

I    =    XB    +    E. 

Y  is  an  nx1  matrix  of  n  response  variable  observations,  X  is 
an  nxp  matrix  having  all  1's  in  the  first  column,  the  n 
observations  of  the  first  carrier  in  the  second  column,  and 
the  n  observations  of  each  of  the  remaining  p-2  carriers  in 
each  of  the  remaining  columns.  B  is  a  px1  matrix  of 
coefficients,  b  ,b  ,b  ,...b    ,   and  E  is  an  nxn   matrix   of 

0    1    2        p-1 

unknown  random  errors,  independent  and  identically 
distributed,  centered  at  zero  and  having  constant  spread. 

Using   a   prime   (•)   to   designate   the   transpose  of  a 
matrix,  the  least-squares  normal  equations  are 

X«Y  =  X»XB   . 


The  weighted  normal  equations  are 

(WX)  •  Y  =  (WX)  'XB 

where  W  is  an  nxn  matrix  having  as  its  diagonal  elements  w 


n 


th  (k) 

the   k    iteration  weights,  w    ,  and  zeros  elsewhere.   The 

i 

weighted  normal  equations  are  equivalent  to  a   system   of   p 


equations   in  p  unknowns  (the  b  ,  i  =  0, 1 , . . . ,p-1)  which  can 

i 

be  written  as 


0  =  VB 

and  is  easily  solved  for  B. 

B.   MODIFICATION  TO  INITIAL  ESTIMATION  PRODEDURE 


Multiple-carrier  regression  problems  require  a 
modification  to  the  initial  estimate  procedure  to  ensure 
that  any  interdependence  among  the  carriers  is  removed  prior 
to  estimating  the  effects  of  the  carriers  on  the  response 
variable.  D.  F.  Andrews  [1]  has  suggested  a  rather 
time-consumming  method  applying  a  robust  sweep  operator  to 
the  columns  of  the  X  matrix  in  an  iterative  process.  An 
alternate  method  inspired  by  Mosteller  and  Tukey  [6] 
sequentially  regresses  the  carriers  on  each  of  their 
predecessors   in  the  X-matrix  to  eliminate  unwanted  effects. 
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Multiple  regression  may  be  viewed  as  a  sequence  of 
single-carrier  regressions  in  which  the  dependent  variable, 
y,  is  regressed  on  the  first  carrier  alone  according  to  the 
model 

y  =  y   x   +  residual 
1  1 

Let   y  ("y   adjusted    for    x    )       be      the      residual      after      the 

;1  1 

effects    of    x      are   removed: 
1 

y        =    y   -       V  x 
;  1  11 

This  residual  is  set  aside  while  the  effect  of  x   on   x   is 

1       2 

removed  using  the  model 


x   =  d    x   +  x      , 
2    2;  1  1     2;  1 


x  being      "x        adjusted      for      x    "    .         The      residual      of      y 

2;1  2  1 

A 

(  y    )  is  then  regressed  on  x    to  find  b   in  the  model 
;1  2;1  2 


y    =  b  x     +  residual 
;1     2  2;1 


Substituting  for  y    and   x   ,       b   =  (  v  -  d    b)so 

;  1  2;1  1  1        2  ;  1   2 

that 


y    =    bx      +bx      +   residual 

11       2  2 


For  a  model  having  a  mean  effect 


y   =  b   +  b  (x  -x  )  +  b  (x   -x  )  +  residual 
i     o     i   i1   1      2   i2   2 
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In   practice,   b    is  found  immediately,  (from  the  median  of 

o 

the   y  ,   as   before)   and   its   effects   not   removed    for 
i 

computational  considerations.   It  is  not  important  to  remove 

the  mean  effect  since   it   is   independent   of   the   carrier 
effects  that  must  be  removed. 


Estimates  for  Y      are  found  using   the   median   estimate 

i 

described  above: 


A 

V. 


a  u    a  1 


x   -   x 
a  ui    a  li 


where  the    subscript  indicates  the   quantities   have   been 
a 

adjusted   for   all   preceeding   carriers.    For  example,  the 


estimate  for  V     would  require   that   v    and   x     both   be 

3  "i        i3 

adjusted   for  the  effects  of  carrier  x   and  carrier  x    .   A 

1  2;1 


similar    procedure    finds   the    d  coefficients      for      the      j 


th 


carrier  regressed  en  its  predecessors.   The  Y     and  d    may 

i       j  ;a 

A 

then  be  arranged  in  a  system  of   equations   in   the   b   and 

i 

subsequently   solved   to   yield   the   coefficients   for   the 
desired  model. 
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TWO-CARRIER    EXPERIMENT 


The 


twc-carrier   experiment    was   analagous   to   the 


one-carrier  tests.   Coefficients  b  ,  b   and  b   were  fixed  to 

0     1         2 

establish  a  known  model 


y   =  13  +  3  (x   -x  )  -  0.5(x   -x  ) 
i  i1   1         i2   2 

The  x    were  the  integers  1  through  20  in   ascending   crder; 
i1 

the   x     were   the   same   integers   shuffled   to   establish 
i2 

independence  in  the  X-matrix  columns.   The   "true"   y    were 

i 

then   calculated   and   subsequently   disturbed   by   the  same 

additive  error   e   as  in  the  one-carrier  case. 

i 


Since  the  single-carrier  experiment  showed  little  change 
in  the  values  of  the  coefficients  from  the  fourth  iteration 
to  the  seventh,  the  two-carrier  iterations  were  terminated 
after  four  cycles  (or  convergence)  for  each  of  1000 
replications  for  each  of  the  five  distributions.  Only  final 
values  were  recorded  since  the  initial  guesses  tended  to  be 
somewhat  unstable  in  the  first  experiment. 


D.   RESULTS  OF  THE  TWO-CARRIER  EXPERIMENT 


The  results  of  the  second  experiment  are  summarized  in 
Figures  20,  21  and  22  in  the  Appendix.  The  estimates  of  the 
coefficients  parallel  the  single-carrier  cases  exactly,  with 
the   exception   of   the   Cauchy   method   applied   to  uniform 
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disturbances.  The  standard  deviations  of  the  Cauchy 
estimates  for  the  coefficients  of  the  uniform-disturbed  data 
are  slightly  lower  than  in  the  single-carrier  case,  in 
contrast  to  the  general  trend  for  the  standard  deviations  to 
be  higher  for  the  two-carrier  problems.  While  there  seems 
to  be  some  interaction  between  the  carriers  which  raises  the 
standard  deviations  in  general,  the  use  of  two  carriers  may 
be  reducing  the  tendency  of  the  Cauchy  method  to  converge  to 
a  local  solution. 
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71.   CONCLUSIONS 


The  robust  method  developed  and  tested  in  this  paper 
demonstrates  extremely  stable  behavior  over  a  variety  of 
distributions  of  random  error.  Traditional  least-squares 
estimation,  on  the  other  hand,  is  subject  to  potentially 
large  errors  in  its  estimates  of  regression  coefficients. 
The  method  based  on  Cauchy- likelihood  weighting  has 
consistently  smaller  error  when  outliers  are  present  and 
only  slightly  larger  errors  (though  never  any  extreme 
errors)  when  the  error  distribution  is  closer  to  the  Normal 
distribution . 

The    Cauchy-likelihood   estimates   appear   to   be   very 

slightly  biased.   The  centers  of  the  y   tend  to  be  estimated 

i 

too   high,  while  the  slopes  of  the  carriers  are  consistently 

low.   Possibly,  if  the  experiment  were  run   again   with   the 

signs   of   the   error   terms   reversed,  the  apparent  biasing 

would  also  reverse  to  imply  that  the  procedure   is   robustly 

unbiased . 

There  are  two  drawbacks  to  the  Cauchy  method.  The  first 
is  its  reguirement  for  more  calculations  and  intermediate 
storage.  The  initial  estimates  of  the  coefficients  alone 
require  nore  computer  assets  than  least-squares  needs  for  a 
complete,  though  possibly  erroneous,  solution.  As  a  general 
rule,  the  robust  Cauchy-likelihood  procedure  requires  twice 
the  data  storage  capacity  and  five  to  six  times  as  long  to 
run  as  a  basic  least-squares  routine.  Clearly,  the  large 
reduction  in  risk  for  obtaining  seriously  inaccurate 
estimates  of  regression  coefficients  warrants  the  use  of  the 
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Cauchy,  or  seme  other  robust   procedure   in   every-day   data 
analysis,  even  with  the  increase  in  computer  requirements. 

The  other  problem  with  using  the  Cauchy-likelihood 
method  is  the  possible  convergence  upon  a  local  solution. 
It  should  be  noted,  however,  that  traditional  least-sguares 
will  also  produce  erroneous  results  when  used  under  the  same 
conditions  which  would  cause  the  Cauchy-based  technique  to 
stabilize  at  a  local  solution. 
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APPENDIX    A 


HISTOGRAMS    OF    LEAST-SQUARES    ESTIMATES 
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>ft^ft->ftftftftftftftftftftVftftftft-*-irft-v*ftft-»ftftftft-^ftftftftft%rftft-^ft-.>ft--rftft*ft 

-■;  j-.  <•.  a  -•  -•■  ---  a  -•<.  .•■  .;;.  a  ---  •-  ■:  a  ft  *  -:-  »  ■■;-  ft  -"-.ft  *  a  a  ft  -"-  ft  -••-  *  -*  a  -:-  a  -:-  -■•-  a  ft  a  #  ■:-  «■  •-  ;-  ."    !•  ■»**«■-*  a      — 

ft    .<.--.    ft    ft   ft   JS    ft   ft    ft   ft   ft    ft   ft    ft   ft   ft     i   ft   ^-    ft   <    ft   ft    ft   *   ft   ft    <    ft    ft   ft    ft   ft   ft   -ft  ft    ft    ft    ft    ft   ft   ft    ft    S.     ft   v     ft  ft    ft   ft    <•    ft    ft  • 

->  ft  <-  ft  ->  a  *  ft  ft  ■>»■  ft  »  a  ft  •»  ft  ft  ■*  ft  *-  •»  *  *  *  •*  -*  *  *  ft  -«•  *  «■  *  »  «  •»  ■*  <-  a  *  ■»  -a-  -»  «■  v  *  *  <-  *  *  ft  ft  i*  <-     r\, 

-;  ft  <   ft  <-  ft  ft  v-  ft  ft  *  .>  •*.-  ft  -r-  ■<•  ft  -<■  ft  ft  ft  -*  ft  ft  ft  ft  ft  ->  ->  ft  ft  ■**.  <  ft  ft  ft  ft  ft  ---  *  <-  a  %-  ^  ft  -„-  ft  ft  -a  ft  ft  ft  a  ft  ft 

frftftftftftftftftftftft^ftftftftft^ftft-wftftftftftft-rftftftftftftft'ftftftftftftv-ftftftftftftftftftftftft 

Vftftftftftftftftftftftftftftft-^ftv-ftftftftftftftftftftftftftftftftftftftftftftftftftftftftftftftftftftft* 

->  --  ft  ,-  ft  -,-  -;-  -x-  ft  *  ft  ■>>  ft  ft  -!■  ft  ft  a  ■>  v  ft  *  -.-  ft  *  -^  *  ->  •»  ft  -„-  -*  >  -<•  -s-  -rv  «  <-  ft  ft  ft  -.-  «■  ft  »  •»  ■»  ft  >  *-  a  ft  ■«■  %i 
-•-  :-  ft  --  ft  -a  •»•  a  ft  ft  -a-  ->  a  ;-  -_•■  ft  x  *  .-:  ft  <  «•  <-  i  ---  ft  -;-  ft  -t-  ft  -•  a  ft  ft  ft  ft  a  -  -x-  i;  ft  ft  *  ft  ft  ft  ft  ft  r  ft  a  <-  ft  • 
--  ft  -.--  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  ft  *  ft  ft  ft  ft  ft  <■  ft  <  ft  <  ft  ft  ft  <■  ft  ft  -ft  ft  ft  ft  ft  -X   ft  ft  ft  ft  ft  -*   ft  ft  ft  ft  -:-       — • 

-  — ^> 

->  ft  *  «  ft  -.'-  ->  a  a  •*  *  ->  ■>;•  ft  <•  a  ft  »  *  ft  a  a  a  <-  a  ft  a  »  i*  -<■  -.<-  a  a  -a  a  »  ft  a  a  a  *  --- 

ft*ftft*ftaftftftftftftft**ftftft«ft->ftftftftftftft*ftft->ftftftftftft»ftft 
ft  i:-  a  ft  ft  a  -■--  <-r  ft  <-  a  ft  a  ft  *  <  a  *  *  ft  a  ft  a  a  a  ft  a  ft  ft  ft  -:-  a  *  ft  a  ft  -a  *  *  -x-  ft  a 

>  <-  -;■    ,-  ft  ft  ft  ft  ::-  <:•  >  -A  ■<*  >  ft  ft  a  •::-  ft  a  *  -:-  a  *  **  i-  a  a  ■*#«***  a  ->  -"-  a  ft  ft  »  a  a  a  <-  a  ft  ft  a  <■■  •«-      ,r 
---  a  a  *  a  «  a  a  *•  v-  ft  ft-  a  ft  a  ->  a  «■»*■*  ■:•  *•»«**  -,-  a  a  *  a  «■  *  ->  ft  *  -"-  *  ft  a  *  »  ft  ft  '-  -"»  ft  ft  ft  <■  <-        • 

-,.-  *;   ■:   :■  ;-  -.-   :-  -»  a  a  a  <  a  ->  a  a  a  a  •■»  a  »  a  a  >  a  a  a  -■«.  ,--   ;.  a  a  - "■  "■  .-  a  --«.  a  ft  a  a  a  a  a  a  a  a  <-  .-  a  a      3 

j >. 

ft  a  ->  ft  a  a  a  a  «■  a  a  -a  ft  a  -:-  a  a  -»  ft  -:-  a  a  *  ft  <■  --  ft  -"-■  a  a  --•  * 

ftVft-Xft^-*ftftftftftftftftftfrftftftft.ftftJi.ftftft-ftftftftft 

a  ft  a  ft  ft  ft  a  «  »  ft  -.:-  ---  a  a  a  a  »-.>»<-  a  -;,-  ft  a  J^  a  a  a  ft  a  ->  a 

ft  a    ;-  *-  -.'   >  ft  '?■  a  a  t  •"-  a  a  ■;■  a  -;-  a  a  a   '•  a       ^ 
«  *  *  »  a  a  a  ■>  a  ■*•  *  x  a  a  *  a  •<■  a  --.-*  a  a        • 
.  -..  a  ft  ss  :.-  ^-  ->  --    *  ft    ,-  ft  a  •-  a  <-  ft  a    >  ' 

->  ft  -  >  a  -  •  a  a  ■»  ft  ft  ft  ft  -*  a  ■>  ft 

a*ftftft»»ft*ft-»ftftftftftft 
a  -c-  a  -.-  ->  a  ft  -<?-  -v  ft  ft  -5-  <•  a  >?■  ft 

a  a  -.     -  a  ft  ft  ft  a  -»  -.r        c 

W   ->  ^    »■  >.    v   ft  "^  ft  -«•  ft  • 

ftft-s-ftftftft-w-ftftft         "C 

ft    V    >  ■*.-   ft    ft    ft   ft    ft   ft 

•*■    "V    *T  -*--    "W    ■>»-      ,-    >,     W    K 

>    --.-    ft  1.*    ft    -,-  -W  ft  ft  ft 

X    ft    ft    ft  _J 

-w   v   ■».-    ft  • 

ft  a  ft  ft       -^ 
a  ft  ft  a 

v   ■>*■   •*  ft 
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True  value  =  13 


Normal 

Cauchy 

Uniform 

ii  y  it 

"Z" 

Mean 

Cauchy 

Least-squares 

13.02 
13.00 

13.01 
298 

13.03 
12.99 

13.01 
1.4E9 

13.18 
14.8 

Std.  Dev. 

Cauchy 

Least-squares 

.417 
.  33  1 

.406 
9240 

2.71 
1  .69 

.405 
1.8E10 

.  358 
2.78 

Minimum 

Cauchy 

Least-squares 

1  1.66 
12.02 

11.52 
-4063 

4.68 
8.24 

11.49 

3.83 

12.  69 
12.86 

. 10  Quantile 

Cauchy 

Least-squares 

12.51 

12.59 

12.54 
9.  85 

9.54 
10.87 

12.55 
11.57 

12.83 
13.47 

.25  Quantile 

Cauchy 

Least-squares 

12.74 
12.77 

12.76 
11  .88 

11.  21 

11.78 

12.75 
12.34 

12.  93 
1  3.86 

.50  Quantile 

Cauchy 

Least-squares 

13.02 

13.00 

13.00 
13.02 

13.02 
12.97 

12.99 
13.25 

13.08 
14.40 

.75  Quantile 

Cauchy 

Least-squares 

13.31 

13.23 

13.27 

14  .04 

14.87 
14.  18 

13.26 

14.58 

13.32 
15.  16 

.90  Quantile 

Cauchy 

Least-squares 

13.56 
13.42 

13.52 
16  .15 

16.58 
15.21 

13.51 

18.54 

13.64 
16.  18 

Maximum 

Cauchy 

Least-squares 

14.35 

14.  04 

15.01 

29.22 

20.  18 
18.  30 

14.63 
2.3E1  1 

15.35 
89.  14 

Figure  20  - 


SUMMARY  OF  TWO-CARRIER  b   DISTRIBUTION 
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True  value   =    3 


Normal        Cauchy      Uniform  "V" 


M7.ll 


Mean 

Cauchy 

Least-squares 

2  .99 
3.00 

Std.  Dev . 

Cauchy 

Least-squares 

.071 
.  057 

Minimum 

Cauchy 

Least-squares 

2.73 
2.79 

.  1  0  Quantile 

Cauchy 

Least-squares 

2.89 
2.93 

.  25  Quantile 

Cauchy 

Least-squares 

2.94 
2.96 

.50  Quantile 

Cauchy 

Least-squares 

2.99 
3.00 

.75  Quantile 

Cauchy 

Least-squares 

3.03 
3.04 

.90  Quantile 

Cauchy 

Least-squares 

3.08 
3  .07 

Maximum 

Cauchy 

Least-squares 

3.24 

3.  18 

2.  99 
16.28 


.071 
354 


2.69 
-65.17 


2.90 
2.47 


2.95 
2.84 


2.  99 
3.00 


3.03 
3.  15 


3.08 
3.37 


3.35 
11  140 


2.96 
2.99 


.446 
.287 


1  .67 
2.00 


2.41 
2.63 


2.66 

2.80 


2.96 
2.99 


3.25 
3.18 


3.54 
3.37 


4.19 
3.85 


2.99 
1.2E7 

3  .00 
2.99 

.072 
2.7E9 

.  044 
.  271 

2.64 
-7E10 

2.77 
.  144 

2.90 
2.52 

2.95 
2  .77 

2.95 
2.86 

2.98 
2.91 

2.99 
3.00 

3.00 
3.00 

3.03 
3.12 

3.02 
3.08 

3.07 
3.37 

3.04 
3.20 

3 .  36 
3.97 

3.35 
6.29 
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56 


True  value  =  -0.5 


Normal   Cauchy   Uniform 


117  U 


Mean 

Cauchy 

Least-squares 

-.501 
-.502 

-.  501 
49.29 

-.5  09 
-.503 

-.501 
-1.3E8 

-.501 
-.502 

Std.  Dev . 

Cauchy 

Least-squares 

.070 
.055 

.069 
1576 

.435 
.283 

.069 
3.359 

.  044 
.44  3 

Minimum 

Cauchy 

Least-squares 

-.767 
-.665 

-.873 
-388 

-1.76 
-1  .29 

-.788 
-5E10 

-.802 
-3.77 

.10  Quantile 

Cauchy 

Least-squares 

-.591 
-.575 

-.585 
-.991 

-1  .08 

-.866 

-.587 
-.897 

-.547 
-.721 

.25  Quantile 

Cauchy 

Least-squares 

-.550 
-.536 

-.542 
-.663 

-.824 
-.682 

-.543 
-.650 

-.518 
-.589 

.50  Quantile 

Cauchy 

Least-squares 

-.499 
-.501 

-.499 
-.503 

-.499 
-.499 

-.499 
-.502 

-.501 
-.496 

.75  Quantile 

Cauchy 

Least-squares 

-.455 
-.468 

-.460 
-.368 

-.214 
-.317 

-.458 
-.367 

-.482 
-.414 

.90  Quantile 

Cauchy 

Least-squares 

-.415 
-.429 

-.418 
-.  109 

-.062 
-.  129 

-.418 
-.126 

-.457 
-.295 

Maximum 

Cauchy 

Least-squares 

-.295 
-.311 

-.  212 
49823 

.619 
.365 

-.234 
3.9E10 

-.294 
1  .07 

Fiqure    22    -       SUMMARY    OF    TWO-CARRIER    b       DISTRIBUTION 
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buted   random  error. 


thesM775 

Robust  regression  using  maximum-likeliho 


3  2768  001  01243  8 

DUDLEY  KNOX  LIBRARY 


